Current Time Series Anomaly Detection Benchmarks are Flawed and are Creating the Illusion of Progress
Time series anomaly detection has been a perennially important topic in data
science, with papers dating back to the 1950s. However, in recent years there
has been an explosion of interest in this topic, much of it driven by the
success of deep learning in other domains and for other time series tasks. Most
of these papers test on one or more of a handful of popular benchmark datasets,
created by Yahoo, Numenta, NASA, etc. In this work we make a surprising claim.
The majority of the individual exemplars in these datasets suffer from one or
more of four flaws. Because of these four flaws, we believe that many published
comparisons of anomaly detection algorithms may be unreliable, and more
importantly, much of the apparent progress in recent years may be illusionary.
In addition to demonstrating these claims, with this paper we introduce the UCR
Time Series Anomaly Archive. We believe that this resource will play a
role similar to that of the UCR Time Series Classification Archive, by providing the
community with a benchmark that allows meaningful comparisons between
approaches and a meaningful gauge of overall progress.
FastDTW is approximate and Generally Slower than the Algorithm it Approximates
Many time series data mining problems can be solved with repeated use of a
distance measure. Examples of such tasks include similarity search, clustering,
classification, anomaly detection and segmentation. For over two decades it has
been known that the Dynamic Time Warping (DTW) distance measure is the best
measure to use for most tasks, in most domains. Because the classic DTW
algorithm has quadratic time complexity, many ideas have been introduced to
reduce its amortized time, or to quickly approximate it. One of the most cited
approximate approaches is FastDTW. The FastDTW algorithm has well over a
thousand citations and has been explicitly used in several hundred research
efforts. In this work, we make a surprising claim. In any realistic data mining
application, the approximate FastDTW is much slower than the exact DTW. This
fact clearly has implications for the community that uses this algorithm:
by switching to exact DTW, it can address much larger datasets, get exact
results, and do so in less time.
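
To make the quadratic cost mentioned above concrete, here is a minimal sketch of the classic dynamic-programming DTW in Python. It is an illustration only, not the implementation evaluated in the paper: the function name `dtw_distance`, the squared pointwise cost with a final square root, and the optional `window` parameter (a Sakoe-Chiba style band that limits how far the warping path may stray from the diagonal) are all assumptions made for this example. Without a band, the nested loops fill n × m cells; with a band of half-width w, they fill roughly n × w cells.

```python
import math


def dtw_distance(a, b, window=None):
    """Exact DTW distance between two numeric sequences.

    Assumptions for this sketch: squared pointwise cost, square root of the
    accumulated cost as the result, and an optional band half-width `window`
    (None means the warping path is unconstrained).
    """
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)
    # The band must be at least |n - m| wide, or no path reaches the last cell.
    window = max(window, abs(n - m))

    inf = math.inf
    # prev[j] holds the accumulated cost for row i - 1; prev[0] = 0 seeds the path.
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        lo = max(1, i - window)
        hi = min(m, i + window)
        for j in range(lo, hi + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return math.sqrt(prev[m])


if __name__ == "__main__":
    x = [0.0, 0.0, 1.0]
    y = [0.0, 1.0, 1.0]
    print(dtw_distance(x, y))            # unconstrained warping
    print(dtw_distance(x, y, window=0))  # band of width 0: diagonal only
```

With `window=0` the path is forced onto the diagonal, so for equal-length inputs the result reduces to the Euclidean distance; widening the band trades extra computation for more warping freedom while the result stays exact for that band.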